using samples of quasars

Ruobing Dong, James Gunn, Gillian Knapp, Constance Rockosi, and Michael Blanton






ABSTRACT 

We investigate in detail the probability distribution function (pdf) of the
proper-motion measurement errors in the SDSS+USNO-B proper-motion catalog
of Munn et al. (2004) using clean quasar samples. The pdf of the errors is
well represented by a Gaussian core with extended wings, plus a very small
fraction (< 0.1%) of "outliers". We find that while formally the pdf can be well fit by
a five-parameter fitting function, for many purposes it is adequate to represent
the pdf with a one-parameter approximation to this function. We apply
this pdf to the calculation of the confidence intervals on the true proper motion
for an SDSS+USNO-B proper-motion measurement, and discuss several scientific
applications of the SDSS proper-motion catalog. Our results have various
applications in studies of Galactic structure and stellar kinematics. Specifically,
they are crucial for searching for hyper-velocity stars in the Galaxy.

Subject headings: astrometry, proper-motions, catalogs 



1. Introduction 






As of the eighth Data Release (hereinafter DRn, where n is the release number) of
the Sloan Digital Sky Survey (hereinafter SDSS), the survey has released imaging for 14,555
deg$^2$, or over a third of the sky (Gunn et al. 1998; York et al. 2000; Lupton et al. 2001;
Stoughton et al. 2002; SDSS-III collaboration: Aihara et al. 2011). The imaging catalog
contains almost half a billion distinct detected objects down to a 50% completeness limit of
r = 22.5 for point sources, and the survey's 1.8 million catalogued spectra provide several well
defined samples of galaxies, quasars, stars, and other objects (SDSS-III collaboration: Aihara et al.
2011).



Along with the photometric and spectroscopic data, SDSS also provides proper-motion
(PM) data (Munn et al. 2004, hereinafter the Munn et al. catalog), produced by matching the
SDSS point source detections with earlier observations, including the USNO reductions of the
Palomar Observatory Sky Surveys (POSS-I and POSS-II), which span about 50 yr in time
(Monet et al. 2003). In this catalog, the USNO proper-motion system is re-calibrated and
made absolute using SDSS galaxies, and these proper motions (called here SDSS+USNO-B
PM) are computed including both SDSS and USNO-B positions. The resultant catalog is
90% complete to g < 19.7, and has a contamination rate of less than 0.5%. The systematic
errors are on the order of 0.1 mas yr$^{-1}$, and the statistical errors are roughly 3-4 mas yr$^{-1}$ in
each component of the PM. Munn et al. (2004) compared their results with those of the
revised New Luyten Two-Tenths catalog (rNLTT; Gould & Salim 2003; Salim & Gould 2003).
Bond et al. (2010) carried out a further comparison of the SDSS+USNO-B proper motions
with those of a sample of stars in the North Galactic Pole region (Majewski 1992) and with
proper motion measurements made using data from SDSS Stripe 82 (Bramich et al. 2008).
These independent measurements are expected to have different systematic errors from the
SDSS+USNO-B PM. The results show that the median differences and the rms scatter of
these comparisons agree with expectation and that the SDSS+USNO-B PM measurements
are reliable at roughly the stated errors.

Proper-motion measurements at this level of accuracy are useful in several ways. Samples
of objects can be defined using the reduced proper motion diagram to separate classes of
objects with intrinsically similar colors, spectra and apparent magnitudes but very different
proper motion distributions due, for example, to different luminosities or different kinematics.
This is used in several target selection algorithms in the SDSS projects, most notably
the Sloan Extension for Galactic Understanding and Evolution (SEGUE I and II; Yanny et al. 2009),
both to help find very nearby objects (with high PM; e.g. Lépine & Scholz 2008)
and distant giants (with low PM). A variation of this method has been explored by defining
a "reduced proper motion" (for example, in the r band the reduced proper motion would be
defined as $r_{\rm RPM} = r + 5\log{\rm PM}$; Salim & Gould 2003), although reduced proper motion
diagrams are not as useful at the faint magnitudes probed by SDSS as they are at the (traditionally
used) bright end (Sesar et al. 2008). Most importantly, PM, along with even crude distance
determinations for stars in the Galaxy, provides two-dimensional velocity information; if in
addition radial velocity measurements are available, one has full three-dimensional velocity
information. A one-sigma accuracy of 3 mas yr$^{-1}$ corresponds to a transverse velocity
accuracy of about 15 km/s at 1 kpc and 150 km/s at 10 kpc. Thus the PM measurements
are statistically very useful for studying the kinematics of the thick disk and halo, for which
the velocity dispersions are of this order at these distances, provided one understands the
errors well. Recent years have witnessed an increased interest in studies of stellar kinematics
and Galactic structure, spurred both by the increasing sophistication of galaxy formation
simulations and the availability of the large photometric sample of stars in SDSS together
with a significant subsample with radial velocities and chemical composition information
obtained as part of its SEGUE subsurveys. With the SDSS+USNO-B PM measurements and
radial velocities, Bond et al. (2010) selected a sample of main-sequence stars with r < 20,
while Carollo et al. (2010) selected a subsample of SDSS calibration stars (mostly metal-poor
turnoff F stars) to study the kinematics of both the Galactic thin and thick disks and
the halo (see also Smith et al. 2009; Schlaufman et al. 2009; Fuchs et al. 2009). Many more
studies of this sort are underway using the SEGUE II data.
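The conversion between a proper-motion accuracy and a transverse-velocity accuracy quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function name and the constant name are ours, and the only physics used is the standard relation that 1 AU yr$^{-1}$ is about 4.74 km/s, so that 1 mas yr$^{-1}$ at 1 kpc corresponds to about 4.74 km/s.

```python
# Standard conversion: 1 arcsec/yr at 1 pc = 1 AU/yr ~ 4.74047 km/s,
# hence 1 mas/yr at 1 kpc is also ~ 4.74047 km/s.
KMS_PER_MAS_YR_KPC = 4.74047  # illustrative constant name (not from the paper)


def transverse_velocity(pm_mas_yr, dist_kpc):
    """Transverse velocity in km/s for a proper motion in mas/yr at a distance in kpc."""
    return KMS_PER_MAS_YR_KPC * pm_mas_yr * dist_kpc


# A 3 mas/yr one-sigma error at 1 kpc and at 10 kpc.
for dist in (1.0, 10.0):
    v = transverse_velocity(3.0, dist)
    print(f"3 mas/yr at {dist:4.1f} kpc -> {v:6.1f} km/s")
```

The two values come out near 14 km/s and 142 km/s, consistent with the rounded figures of about 15 km/s and 150 km/s in the text.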

To do this job really well clearly requires detailed knowledge of the distribution of the 
proper motion, distance and radial velocity errors, since for most of these applications the 
proper-motion related velocity errors are of the same order as the velocities themselves. 
Furthermore, in most cases the error in the tangential velocity due to the proper motion 
and distance error is the dominant contributor to the total velocity error. In addition, if one 
is interested (and one usually is) in extreme velocities, understanding the behavior of the 
distribution in the tails of the pdf is crucial. We approach this problem in the way others 
have done in the past, but with the goal from the outset of understanding the pdf in as much 
detail as the sample sizes and systematics will allow. 

Quasars are sufficiently distant that they should not have any measurable PM in SDSS.
Thus, the measured PM of a quasar is just the PM measurement error. By studying the
PM of a large sample of quasars, we can probe the statistical properties of the PM errors,
and study their dependence on various observational and instrumental parameters. Although
the spectral energy distributions of quasars differ from those of the point sources for which
PM is of interest (stars), the major contributors to the systematic error in the PM
measurement work in essentially the same way for all point sources. These contributors include
the difficulty in centroiding sources on photographic plates; errors due
to unresolved or partially resolved projected nearby objects; errors in the primary
reference catalogs; systematic errors in the UCAC catalog (Zacharias et al. 2010); and charge
transfer effects in the SDSS astrometric and photometric detectors (Munn et al. 2004).
Thus the error distribution for all point
sources will be very similar (Bond et al. 2010; but caution is still needed when applying the
results based on quasars to stars, see Sections 3 and 5 for details). Along with the
publication of the PM catalog, Munn et al. (2004) used spectroscopic quasars in the SDSS DR1
(Schneider et al. 2002) to study the mean and variance of the SDSS+USNO-B PM error, but
due to the limited sample size were not able to study the dependence on magnitude. With
a much larger spectroscopic quasar sample in SDSS DR7, Bond et al. (2010) addressed this
issue again and presented the width $\sigma$ of the Gaussian distribution as a function of r-band
magnitude. They assumed the error distribution was Gaussian and did not investigate the
form of the distribution function; in particular, they did not try to characterize the wings of
the distribution.

In this work, we analyze the full error distribution of the SDSS+USNO-B PM (with the
corrections presented by Munn et al. 2008, which corrected an error in the calculation of the
proper motions in right ascension). We do this using clean quasar samples which we define
in such a way that corresponding clean stellar samples are easily constructed. We fit the 
PM error distribution in the entire magnitude range by a Gaussian distribution for the core 
plus a wing function, and derive the dependence of the function parameters on magnitude. 
Using this fitting function, we calculate the significance of a given PM measurement. We also 
quantify (or at least place upper limits on) the fraction of the outliers in the PM measurement 
which survive the cleaning process and for one reason or another clearly do not belong to 
the main distribution. We then discuss various issues which involve applying the analysis of 
the PM error distribution to other samples. 

The structure of this paper is as follows. In Section 2 we introduce the quasar samples
and the criteria for selecting reliable PM. We then fit the PM error distribution by
parametrized analytic functions of both complex and simplified forms in Section 3 and
calculate the significance of measured PM in Section 4. We summarize our results and discuss
the issues related to using them in Section 5.



2. Quasar Sample Selection 

Our goal is to select a quasar sample with as few contaminants as possible (e.g. stars
and other objects with intrinsic non-zero PM), while maintaining a sample size large enough
for statistical analysis, and at the same time covering the largest possible magnitude range to
allow studying the dependence of the PM error on brightness. There are four quasar samples in
the recent literature which we have looked at: Richards et al. (2009) and Bovy et al. (2011),
which are photometric samples, and Bond et al. (2010) and Schneider et al. (2010), which
are spectroscopic samples. We expect these samples to overlap with each other for the most
part (especially Bond et al. (2010) and Schneider et al. (2010)), but the non-overlapping part
still makes a significant difference to the cleanness of the samples, as we will show. For each
sample, we obtain the PM and photometry from the SDSS DR7 Catalog Archive Server table
propermotions. We use the DR7 values in preference to the DR8 values both because of a
known error in the generation of the DR8 proper motions, and because the photometry in
the new DR8 reductions has been less well checked. The DR8 PM problem is discussed on
the DR8 website and in SDSS-III collaboration: Aihara et al. (2011), and will be repaired
in DR9.



We use the following criteria (Kilic et al. 2006) to determine a clean PM:

1. match = 1

2. sigRa < 525 && sigDec < 525

3. nFit = 6

4. dist22 > 7



where match is the number of objects in USNO-B which match an SDSS object within a
1 arcsec radius, sigRa and sigDec are the rms residuals for the proper motion fit in right
ascension and declination, nFit is the number of detections used in the fit including the SDSS
detection (so nFit = 6 requires that the object was detected on all five USNO-B plates plus
one for SDSS), and dist22 is the distance to the nearest neighbor in SDSS with g < 22.
We reject PM entries which violate any of the above conditions. Specifically, for our
main quasar sample S-Schneider (see below), the fraction of objects which violate each cut
condition is: 8.2% for nFit = 4 (the minimum nFit in the Munn et al. catalog), 17.3% for nFit = 5,
0.5% for match ≠ 1, 2.9% for sigRa > 525 or sigDec > 525, and 5.6% for dist22 <
7. The sample selection completeness (defined as the ratio of the number of objects which
survive the cut to the total number of objects) as a function of the g-band magnitude
(the six g bins in Table 1) for S-Schneider is shown in Figure 1. The selection completeness
saturates at the bright end (~ 90%), and drops rapidly toward fainter magnitudes. The
selection completeness curves for several samples of stars are shown in the same plot for
comparison. We pick out all spectroscopic stars in SDSS DR7, bin them into three color
bins (g - r > 1.0, 1.0 > g - r > 0, g - r < 0), and carry out the same cleaning process
to obtain the selection completeness as a function of g. As the color becomes redder, the
completeness for stars decreases at the bright end while it increases at the faint end.
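The four clean-PM cuts and the selection completeness defined above can be written as a short filter. The sketch below is illustrative only: it assumes each catalog entry is available as a Python dict keyed by the column names used in the text (match, sigRa, sigDec, nFit, dist22); the function names and the example rows are made up and are not taken from the catalog.

```python
def is_clean_pm(row, sig_max=525.0, dist22_min=7.0):
    """Return True if a catalog row passes all four clean-PM cuts from the text."""
    return (
        row["match"] == 1               # exactly one USNO-B match
        and row["sigRa"] < sig_max      # rms fit residual in right ascension
        and row["sigDec"] < sig_max     # rms fit residual in declination
        and row["nFit"] == 6            # detected on all five plates plus SDSS
        and row["dist22"] > dist22_min  # no g < 22 neighbor too close
    )


def selection_completeness(rows):
    """Fraction of rows that survive the cuts (NaN for an empty input)."""
    rows = list(rows)
    if not rows:
        return float("nan")
    return sum(is_clean_pm(r) for r in rows) / len(rows)


# Made-up example rows, for illustration only.
example_rows = [
    {"match": 1, "sigRa": 120.0, "sigDec": 200.0, "nFit": 6, "dist22": 15.2},  # passes
    {"match": 1, "sigRa": 120.0, "sigDec": 200.0, "nFit": 5, "dist22": 15.2},  # fails nFit
    {"match": 2, "sigRa": 120.0, "sigDec": 200.0, "nFit": 6, "dist22": 15.2},  # fails match
    {"match": 1, "sigRa": 600.0, "sigDec": 200.0, "nFit": 6, "dist22": 15.2},  # fails sigRa
]
print(selection_completeness(example_rows))
```

Passing sig_max=350.0 and dropping the nFit and dist22 conditions would correspond to the weaker standard cut discussed in the next paragraph; that variant is not implemented here.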

We note here that the way in which we trim the catalog for clean PM is different from
the widely used standard procedure (e.g. Bond et al. 2010) recommended in the
original sample design (Munn et al. 2004), which only requires match = 1 and sigRa <
350 && sigDec < 350. Considerations of various selection criteria show that the standard
PM selection criteria return a much less clean quasar PM distribution, which has significantly
more quasars with spurious large PM values. Specifically, the important new condition nFit
= 6 makes our samples much cleaner than samples defined using the
standard criteria of Munn et al. (2004), even when we loosen the requirement on the
rms fitting residuals a little (see Table 1 and the discussion below). We would like to stress
that the functional forms which we derive later in this paper for the PM error distribution
can only be applied to clean PM samples as defined above.

We now use the several SDSS quasar catalogues from the literature to define samples
for the analysis of the PM error distribution. Richards et al. (2009) selected ~ 1.2 million
photometric quasar candidates from SDSS DR6 based on a Bayesian selection method
employing the kernel density estimate (KDE) of the probability density function. The i
magnitude range for this sample is ≈ 17-21. After selecting all the objects which passed their
selection criterion, Richards et al. (2009) flagged the most likely contaminants by assigning
a good index to every object. This index starts at 0 for each object, and is then incremented or
decremented based on a set of rules. In the end, every object is assigned a good index in the
range [-6, 6], with larger positive values meaning a larger likelihood of being a quasar. For
our purpose, we back out the part of the good index determination which makes use of the
PM, and assign each object a new good' index which does not contain any PM information.
We then choose objects with the new good' > 3 so that we obtain a clean sample (hereinafter
S-Richards) while retaining enough objects to carry out the analysis.



Bovy et al. (2011) generated an SDSS quasar targeting catalog by assigning a probability
of being a star, a low-redshift (z < 2.2), medium-redshift, or high-redshift (z > 3.5) quasar to
~ 160 million point sources with dereddened i-band magnitude between 17.75 and 22.45 in
SDSS DR8. They did this by modeling the distributions of stars and quasars in flux space
down to the SDSS flux limit, applying the extreme-deconvolution method to estimate
the underlying density of each class as a function of magnitude, and then convolving the
densities with the flux uncertainties to assign to each object the probability of its being
a quasar. We select a subsample (hereafter S-Bovy) with available clean PM, and reject
objects with good != 0 (objects which fail some of the BOSS flag cuts), Photometric = 0 (objects
which were observed under bad imaging conditions), and quasar probability < 99% for all three
categories.



Bond et al. (2010) selected 69,916 spectroscopic quasars from SDSS DR7 with 14.5 <
r < 20 and 0.5 < z < 2.5. We repeat the selection to define the sample S-Bond using the
above improved criteria for finding clean PM.



Schneider et al. (2010) produced the spectroscopic quasar catalog for SDSS DR7, which
contains 105,783 spectroscopically confirmed quasars with luminosities larger than $M_i = -22.0$.
This catalog has been visually inspected, has highly reliable redshifts from 0.065
to 5.46, and contains quasars fainter than i ≈ 15. We select objects with clean PM from
their catalog to form the sample S-Schneider. Again, we expect a large overlap between
S-Schneider and S-Bond. Furthermore, to show the differences in the resulting samples due
to different clean PM conditions, we select an additional sample S-Schneider-W from the
Schneider quasar catalog with the clean PM conditions recommended in Munn et al. (2004)
(match = 1 and sigRa < 350 && sigDec < 350).

Note that in all these samples, an object can have a large measured proper motion for
two reasons. First, it could in fact have a real large proper motion, and is therefore presumably
not a quasar but a white dwarf or other peculiar star. These sneak through even in
the spectroscopic samples (as an example, the object plate=1642, mjd=53115, fiber=81
is included in S-Schneider, but is actually a white dwarf). Second, the proper motion
measurement occasionally fails for some reason, such as mismatches with USNO-B or
bad deblends. Clearly the assignment of a proper motion error to the latter category or to
the 'tail' of the main distribution is a bit subjective, but in fact for our samples it is pretty
clear, as we shall see.

The total PM (PM $= \sqrt{\mathrm{pml}^2 + \mathrm{pmb}^2}$, where pml and pmb are the longitudinal and
latitudinal components of PM in the Munn et al. catalog, and pml contains the factor of cos b)
distributions in six g-band bins for the four samples cut by our strict clean PM criteria are
shown in Figure 2, where the magnitude is the psf magnitude without extinction correction.
The statistical information for all five samples is listed in Table 1, including the number
of objects with PM > 10 mas yr$^{-1}$ (the commonly assumed 3$\sigma$ uncertainty), and PM > 30
mas yr$^{-1}$ (defined as the "outliers", see below). We note that compared with S-Schneider,
S-Schneider-W is about 25% larger in size, but contains over an order of magnitude more
objects with PM > 30 mas yr$^{-1}$. These objects, which lie in the tail of the distribution,
are almost certainly due to either contamination or failed PM measurement, as discussed
above. Given the high quality of the eyeball-inspected Schneider catalog, it is very likely
the latter which contributes most. Again, in order to exclude as many of these failed PM
measurements as possible, a strict set of clean PM conditions such as ours should be
applied instead of the one originally recommended in Munn et al. (2004).
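The total PM and the threshold counts reported in Table 1 are simple to compute once the two components are in hand. The following sketch assumes the components are given as (pml, pmb) pairs in mas yr$^{-1}$, with pml already including the cos b factor as in the catalog; the function names and the sample values are illustrative, not taken from the paper.

```python
import math


def total_pm(pml, pmb):
    """Total proper motion from its longitudinal and latitudinal components."""
    return math.hypot(pml, pmb)


def tail_counts(pm_pairs, thresholds=(10.0, 30.0, 50.0)):
    """Count objects whose total PM exceeds each threshold (mas/yr)."""
    totals = [total_pm(pml, pmb) for pml, pmb in pm_pairs]
    return {t: sum(pm > t for pm in totals) for t in thresholds}


# Made-up component pairs, for illustration only.
pairs = [(3.0, 4.0), (30.0, 40.0), (8.0, 8.0)]
print(tail_counts(pairs))
```

The comparisons are strict, so an object with a total PM of exactly 50 mas yr$^{-1}$ is counted above the 10 and 30 thresholds but not above 50.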



Among the four samples resulting from our clean PM criteria, S-Richards
(slightly better) and S-Schneider are in general the two cleanest, with fewer than 0.03% of the objects
having PM > 30 mas yr$^{-1}$. That is as expected, since the spectra in the S-Schneider sample
have been visually inspected and verified to have quasar-like spectra, and the high good' index
objects in S-Richards have been cross matched with a lot of quasar-related information. In
fact, in the magnitude range covered in this study, the samples overlap almost completely.
On the other hand, S-Bovy performs as well as the previous two samples except at the very
high PM end, while S-Bond has a substantially larger fraction of contamination by spurious
high proper motion values. This is due to the fact that its selection relies only on the catalog
spectroscopic redshift, for which the spectroscopic pipeline occasionally produces a false
measurement, while the S-Schneider sample spectra were visually examined. The sizes of the four
samples cut by our strict clean PM conditions vary from 50,375 (S-Richards) to 66,658
(S-Schneider), while S-Bovy has too few objects at the bright end for statistical use. For our
purpose of studying the PM error distribution, we choose S-Schneider as the main sample
for analysis, and also study S-Richards for comparison.



3. The distribution of quasar PM 

In this Section, we fit the total PM distribution in each g magnitude bin of our quasar 
samples by analytic expressions, and investigate the dependence of the fitting result on 
magnitude. We use the magnitude uncorrected for extinction, since the errors should depend 
only on the apparent brightness of the source. 

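Equation 1 gives the core + wing function that we use to fit the quasar PM distributions:

$$f(p\,|\,A,\sigma,B,\alpha,\beta,c) = A\,\frac{p}{\sigma^2}\,e^{-\frac{p^2}{2\sigma^2}} + B\,p^{\alpha}\,e^{-\left(\frac{p}{c\sigma}\right)^{\beta}} \qquad (1)$$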



We here and in what follows assume that the proper motion errors are isotropic; we will
return to this assumption below. We use a 2D Gaussian function to model the central part
of the distribution, where p is the proper motion, A is the amplitude of the Gaussian core,
and $\sigma$ is the width. For the wing part of the distribution, we tried various fitting functions,
eventually finding that the second term in Equation 1 provides an adequate fit. B is the
amplitude, $\alpha$ and $\beta$ are two indices, and c is a constant. With values of both indices close to
1, this wing function decays exponentially at large PM, as suggested by Figure 2, and is
well-behaved at small PM with little effect on the Gaussian core. For all magnitude bins, we use
PM in the range of 0-30 mas yr$^{-1}$ for fitting. Extrapolation to larger proper motion values
is well-behaved, though even in the cleanest samples the few objects at larger values have
flattish distributions and are almost certainly either contaminants or failed measurements.
Thus we can say little about the distribution beyond 30 mas yr$^{-1}$, except to quote maximum
probabilities for whatever causes a measurement error this large or larger to occur, under the
assumption that all the outliers are due to measurement error, which may well be true for
S-Schneider. The fractions of these outliers in each g bin are listed in Table 2.

Table 1. Sample statistics

Number of objects in each category

                       S-Richards   S-Bovy   S-Bond   S-Schneider   S-Schneider-W
g < 18.0                     3962       89     3893          4833            5131
18.0 < g < 18.5              6371     2994     7193          8061            8504
18.5 < g < 19.0             13397    10785    16076         17465           18794
19.0 < g < 19.5             17862    17252    22247         24012           27549
19.5 < g < 20.0              6711    16741     7846          9221           13459
20.0 < g < 20.5              2072     7604     1434          3066           10075
Total                       50375    55465    58689         66658           83512
PM > 10 mas yr$^{-1}$        1535     1900     2623          1980            4180
PM > 30 mas yr$^{-1}$          12       49      296            17             227
PM > 50 mas yr$^{-1}$           1       24      214             2              70

We minimize the total $\chi^2$ to fit the PM distribution in each bin to find A, $\sigma$, B, $\alpha$, $\beta$, and
c. We then normalize the distribution function to get the probability function by replacing
A and B with a and b to make the integral unity ($\int_0^\infty f(p\,|\,a,\sigma,b,\alpha,\beta,c)\,dp = 1$). We plot the
fitting results in Figure 3 for S-Schneider (red curves), and list the values of the parameters
and the $\chi^2$ statistics in Table 2 for S-Schneider and S-Richards (first two sets of rows).
The fits are generally good, with normalized $\chi^2$ in the range of 0.66-1.5 for both samples.
The fitted parameters for the two samples are similar to each other, which indicates the
robustness of the fitting function. (But remember that the two samples are not by any
means independent.)

It is clear that several of the parameters do not change very much with magnitude, so
we also conducted experiments in which we freeze the values of some of these, and refit the
distributions with this reduced freedom. The third set of rows in Table 2 shows the best result
from this exercise, where we fix $\alpha = 1.0$, $\beta = 1.0$, $c = 0.9$, and define a normalized tail amplitude
$b = 0.035$ (and a corresponding normalized Gaussian amplitude $a = 1 - b(c\sigma)^2$). This makes
the normalized probability function a one-parameter function of $\sigma$ alone ($\int_0^\infty f(p\,|\,\sigma)\,dp = 1$).
Figure 3 shows these results (green curves), where it is clear that the difference between the
full-freedom and the reduced-freedom fitting is small in all the magnitude bins. The total
$\chi^2$ only moderately increases with the new fitting function (the normalized $\chi^2$ even drops in
some cases due to an increase in the number of degrees of freedom), and the largest increase occurs in
the 19.0 < g < 19.5 magnitude bin (which also has the largest normalized $\chi^2$ (1.61) and the
largest population, so is most sensitive to small inadequacies in the fitting function). We
show the detailed $\chi^2$ map for this case in Figure 4. The $\chi^2$ at each PM bin are uniformly
scattered around 1, and the accumulated $\chi^2$ behaves well. The normalized one-parameter
fitting function is:

f(p\a) = (1 - 0.035(0.9a)2)^e"^ + 0.035pe-(ofc) (2) 
Table 2. Fitting parameters and statistics

Sample             g range            g_ave(a)  f_o(b)   a(c)   sigma   b(d)    alpha   beta   c      chi^2   Normalized chi^2
S-Richards(e)      g < 18.0           17.52     0.000%   0.92   2.48    0.020   0.51    0.88   0.97    43.2   0.80
                   18.0 < g < 18.5    18.28     0.031%   0.88   2.53    0.016   1.03    1.03   0.93    59.7   1.11
                   18.5 < g < 19.0    18.78     0.015%   0.78   2.72    0.038   0.89    1.03   1.01    67.5   1.25
                   19.0 < g < 19.5    19.24     0.022%   0.71   2.99    0.036   1.10    1.06   1.06    80.9   1.50
                   19.5 < g < 20.0    19.70     0.045%   0.57   3.50    0.054   1.01    1.00   1.25    45.9   0.85
                   20.0 < g < 20.5    20.19     0.048%   0.46   3.76    0.039   1.11    1.01   1.10    40.8   0.76
S-Schneider(e)     g < 18.0           17.53     0.021%   0.83   2.44    0.041   0.77    1.01   0.98    39.4   0.73
                   18.0 < g < 18.5    18.28     0.037%   0.89   2.53    0.016   1.00    1.01   1.06    68.3   1.27
                   18.5 < g < 19.0    18.78     0.017%   0.77   2.70    0.037   0.93    1.05   1.00    81.1   1.50
                   19.0 < g < 19.5    19.24     0.021%   0.67   2.98    0.044   1.14    1.04   0.86    74.6   1.38
                   19.5 < g < 20.0    19.70     0.033%   0.56   3.38    0.053   1.14    1.01   0.78    62.3   1.15
                   20.0 < g < 20.5    20.20     0.065%   0.52   3.80    0.036   1.14    1.00   0.86    35.4   0.66
S-Schneider(f)     g < 18.0           17.53     0.021%   0.83   2.42    0.035   1.00    1.00   0.90    40.9   0.71
                   18.0 < g < 18.5    18.28     0.037%   0.82   2.51    0.035   1.00    1.00   0.90    74.9   1.29
                   18.5 < g < 19.0    18.78     0.017%   0.79   2.70    0.035   1.00    1.00   0.90    81.7   1.41
                   19.0 < g < 19.5    19.24     0.021%   0.74   3.01    0.035   1.00    1.00   0.90    93.4   1.61
                   19.5 < g < 20.0    19.70     0.033%   0.68   3.38    0.035   1.00    1.00   0.90    79.4   1.37
                   20.0 < g < 20.5    20.20     0.065%   0.57   3.91    0.035   1.00    1.00   0.90    38.6   0.67
S-Schneider-W(e)   g < 18.0           17.53     0.12%    0.76   2.44    0.056   0.99    1.05   0.90    41.5   0.77
                   18.0 < g < 18.5    18.28     0.15%    0.85   2.53    0.022   1.05    1.04   1.03    55.2   1.02
                   18.5 < g < 19.0    18.78     0.13%    0.78   2.75    0.038   0.89    1.01   0.94    84.9   1.57
                   19.0 < g < 19.5    19.24     0.18%    0.75   3.08    0.034   0.97    1.00   0.90   134.3   2.49
                   19.5 < g < 20.0    19.72     0.35%    0.67   3.49    0.031   0.99    0.99   0.93   110.7   2.05
                   20.0 < g < 20.5    20.24     0.87%    0.46   4.18    0.033   1.12    1.00   0.88    71.6   1.33

(a) Average g magnitude of objects in each bin.

(b) Fraction of the outliers (objects with PM > 30 mas yr$^{-1}$).

(c) Normalized A in Equation 1.

(d) Normalized B in Equation 1.

(e) Fitted by Equation 1.

(f) Fitted by Equation 2 with one more free parameter on the overall scale.

Due to the limited quasar sample size we can only fit the PM distribution and extract
the fitted parameters in six g magnitude bins (the average magnitude of each g bin for both
S-Schneider and S-Richards is listed in Table 2). To get the distribution function at an
arbitrary g, we fit a quadratic function to the $\sigma$ from the one-parameter fits
(Equation 2; $\sigma$ is the only free parameter in the normalized function) as a function of g:

$$\sigma = 0.2293\,(g - 19)^2 + 0.6205\,(g - 19) + 2.836 \qquad (3)$$



Figure 5 shows the fitting results, which are very good. Bond et al. (2010) studied the
PM error distribution using their quasar sample. They fitted a Gaussian profile to the error
distribution in the entire magnitude range, and obtained a fitting function for $\sigma$, which is
shown here as well for comparison. (They fitted $\sigma$ as a function of r-band magnitude; to
make a direct comparison we convert r into g via g - r = 0.18, which is the average value
for S-Schneider, see below.) In general, their fitted $\sigma$ is larger (by ~ 0.5 mas yr$^{-1}$) than
ours in the entire magnitude range, while the two curves share very similar shapes. This
is fully consistent with the fact that Bond et al. (2010) used a less clean quasar sample
than S-Schneider and did not separate the tail of the distribution from the Gaussian core,
both resulting in a larger Gaussian width. Our fitting formula, of course, applies only to
a finite range in g magnitude, since we have only a finite range over which to determine
it. We recommend using it in the range 17.5 < g < 20.5 (the range we plot in the
figure). For g < 17.5 we recommend using a constant $\sigma = 2.42$, because for bright objects
the proper-motion measurement error approaches the instrument-induced error limit and
does not depend on the brightness of the object. For g > 20.5, the fitting function is
ill-constrained; we have too few quasars to calibrate it, and the core parameters are not
well determined.
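The recommendations above translate into a small helper that returns $\sigma$ for a given g magnitude. This is a sketch under stated assumptions: the function name is ours, the quadratic and the bright-end constant are the values quoted in the text, and raising an error for g > 20.5 is our own design choice to reflect that the relation is not calibrated there (a caller could equally return a flagged value).

```python
def sigma_of_g(g):
    """Core width sigma (mas/yr) of the one-parameter pdf as a function of g magnitude."""
    if g > 20.5:
        # The text states the relation is ill-constrained fainter than g = 20.5.
        raise ValueError("sigma(g) is not calibrated for g > 20.5")
    if g < 17.5:
        # Bright end: constant value recommended in the text.
        return 2.42
    x = g - 19.0
    return 0.2293 * x * x + 0.6205 * x + 2.836


# Evaluate at the average g of each S-Schneider bin listed in Table 2.
for g in (17.53, 18.28, 18.78, 19.24, 19.70, 20.20):
    print(f"g = {g:5.2f}  sigma = {sigma_of_g(g):.2f} mas/yr")
```

Evaluated at the bin averages, the quadratic reproduces the one-parameter $\sigma$ values in the third set of rows of Table 2 to within about 0.02 mas yr$^{-1}$, and it joins the bright-end constant smoothly at g = 17.5.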

Finally, we fit the fraction of the outliers ($f_o$ in Table 2) in S-Schneider as a function
of g magnitude. Given the small number of outliers (17), this fitting can only indicate the
general trend of the $f_o$-g relation. We approximate the fitting function by:

$$f_o = 0.00122\,\left(10^{1.2(g-19)} + 20.1\right)\% \qquad (4)$$

(Note that $f_o$ is in units of %.) The fit is shown in Figure 6. As is the case with the
fitted $\sigma$-g relation, we recommend confining the use of this relation to g < 20.5, since we
have insufficient numbers of objects at fainter magnitudes. The extension to the brighter
magnitudes can probably be trusted, however.

Before we move to the next section, there are several general issues about the PM error 
distribution that we would like to discuss. 
1. Since most previous studies which used the SDSS+USNO-B PM catalog employed
the original clean PM criteria in Munn et al. (2004) to select objects with good PM
measurements, here we explore the effect of this weaker PM cut on the error distribution
by fitting S-Schneider-W using Equation 1. The result is shown in Table 2 (bottom set
of rows). Compared with the fitting results for S-Schneider (the second set of rows in
the same table), the original clean PM conditions result in a slightly larger $\sigma$ (up to
~ 10%), and generally worse $\chi^2$ statistics (both effects are more significant at the faint end).
The biggest disadvantage of the original clean PM criteria is still, as we discussed above,
that they introduce a much larger $f_o$. Studies of stellar kinematics which are sensitive to
the outlier fractions, such as searches for extremely high velocity stars in the Galaxy,
will be severely affected if this weaker set of clean PM criteria is applied instead of
ours.



2. In principle, PM errors could depend on color as well as magnitude; Bond et al. (2010)
found that the systematic errors in PM have a small color dependence, but did not find
a corresponding dependence for the random errors. The color range of quasars
is too small to investigate this possibility. Several color distributions for S-Schneider
are shown in Figure [7]. Stars, on the other hand, have a much wider range of colors and
their measured proper motions may be subject to color-related errors which cannot be
investigated with quasar samples. If this is the case, the error functions derived here
are more applicable to samples of blue stars, such as main-sequence-turnoff samples.
The POSS positions which enter the proper motion calculation are inverse-variance
weighted combinations of data from the O, J, E, F, and N plates and thus span a
very large wavelength range, most of which is to the red of the effective wavelength of
the SDSS g band, but it is not clear exactly what the effective wavelength is. We
probably somewhat overestimate the errors for red objects, because they are brighter
in most of the photographic bands than typical quasars with the same g magnitude,
i.e., our result is a conservative limit for red objects (which is the reason that we choose
to investigate the dependence of the PM error distribution on g magnitude instead of
on redder bands). The mean g − r color in the figure is 0.18, which corresponds to
middle-to-late F stars. The sample which prompted this study is a
SEGUE II sample of halo turnoff stars, for which the color match with the quasars is
(entirely fortuitously) excellent.

3. There are additional possible contributions to the PM errors, such as the way the
positions of the objects on the sky are measured and the observing conditions. The USNO-B positions
are originally in the coordinate system of the USNO plates and are later transferred
to celestial coordinates (right ascension and declination), while the SDSS positions
are initially measured in the CCD coordinate system, are later transferred to the
survey longitude and latitude which define the photometric scans of the sky, and are
then calibrated with respect to the ra-dec measurements from UCAC. The derivation of
proper motions from these two sets of measurements has the probable effect of making
the errors in the proper motions more nearly isotropic (see the next point). Bond et al.
(2010) looked at the dependence of the median and rms errors for the longitudinal and
latitudinal PM components on position on the sky, and concluded that the variation is
relatively small (with the median variation being much smaller than σ). On the other
hand, whether the PM error distribution (especially the wing component) depends
systematically on position on the sky is another question, which again we are not
able to address due to the limited size of the quasar samples. Note, however, that
our investigation deals well with the aggregate survey, so questions like the number of
outliers expected in samples large enough to sample the sky in a manner comparable
to the quasars should be well answered.

4. We investigate the error distribution for the total PM and not for the individual
components of the PM, since the errors are likely to be isotropic and, for many purposes,
it is the total PM which matters. Bond et al. (2010) concluded that the correlation
between the errors in the two components is negligible compared to the total random
and systematic errors. In addition, we find that the error distributions in the two
components are very similar, as shown in Figure [8]. For S-Schneider, the median is 0.10
for pml and −0.17 for pmb, both significantly smaller than the Gaussian width σ, and
the standard deviation is 3.40 for pml and 3.43 for pmb. This inferred near-isotropy of
the PM errors, however, should be viewed with some caution, as the l and b
components are related in a very complex and variable way to the components along and
perpendicular to the scan directions in SDSS, in altitude and azimuth, or in right ascension
and declination, and in any of those cases there might be factors contributing to
anisotropy.



5. In addition to providing the PM measurement, Munn et al. (2004) also provided a PM
error estimate (pmraerr and pmdecerr, which represent the expected standard
deviations of the PM measurement around the true value in each direction; the two
components are always assumed to be the same in the catalog). Since most previous
investigations which used the SDSS stellar kinematics to study Galactic structure
employed the catalog-provided pmraerr and pmdecerr to estimate the uncertainty of
the PM measurement, it is worth calibrating the performance of this error estimate
using the true PM error distribution which we get from our quasar samples. We calculate
the total catalog-provided PM error estimate (PM_error = sqrt(pmraerr² + pmdecerr²)) for
S-Schneider, and show its statistical distribution in Figure [9]. The distribution is rather
narrow, effectively ranging from 3 to 6 mas yr⁻¹. In addition, we provide the average
pmraerr (or pmdecerr) for each g bin, and overplot it in Figure [5] (from small to
large g_avg, the values are 2.78, 2.99, 3.18, 3.37, 3.54, and 3.69 mas yr⁻¹). We note here that
although we use a 2D Gaussian function to model the core part, the σ in the PM error
pdf is still a 1D Gaussian σ (i.e., in one direction, assuming isotropy for the
distribution), so it is pmraerr (or pmdecerr) which should be compared with the fitted σ,
not PM_error. The comparison shows that the catalog-provided error estimate is in good
agreement with our fitted σ (within 20%), with the former on average being ~12%
higher than the latter (the agreement is better at the faint end). The major drawback
of the catalog-provided error estimate is that it does not address the non-Gaussian tail
of the error distribution. In the future, we recommend that analyses using the Munn
et al. PM, especially those for which the tail of the PM error distribution is
important, use our results to estimate the error instead of the catalog-provided
errors.
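
To make the distinction between the per-component and the total catalog error estimates concrete, here is a minimal sketch. The function name `total_pm_error` is ours and is not a catalog field; its inputs are the catalog's `pmraerr` and `pmdecerr`, and the list holds the six bin-averaged per-component values quoted above.

```python
import math


def total_pm_error(pmraerr: float, pmdecerr: float) -> float:
    """Quadrature sum of the two per-component catalog error estimates."""
    return math.hypot(pmraerr, pmdecerr)


# Average per-component errors (mas/yr) quoted in the text for the six g bins.
avg_component_err = [2.78, 2.99, 3.18, 3.37, 3.54, 3.69]

# With equal components the total is sqrt(2) times the per-component value,
# so the per-component value, not the total, is what compares to the 1D sigma.
totals = [total_pm_error(e, e) for e in avg_component_err]
```

For equal components the totals are larger than the per-component values by a factor of sqrt(2), which is why comparing the total estimate directly with the fitted 1D σ would overstate the catalog errors.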



4. The Significance of the measured proper motions 

With the normalized PM distribution function f(p) for our quasar samples, we can
calculate the PM error probability function (f(p_error) = f(p)) of the measured PM (p_measured)
for an object (a star) with intrinsic non-zero PM. Specifically, f(p_error) dp_error is the probability
that the true proper motion p_true = p_measured + p_error (as vectors) falls in an annulus centered on p_measured
with radius between p_error and p_error + dp_error (see Figure [10]). Based on this, we can calculate
the probability of an object with p_measured having an (unknown) true PM p_true larger than
some given value p'_true. As shown in Figure [10], the total probability of p_true being in the
shadowed area (p_true < p'_true) is

$$ F(p_{\rm true} < p'_{\rm true}) = \int_0^{p'_{\rm true}} \int_0^{2\pi} \frac{f(p_{\rm error})}{2\pi\, p_{\rm error}}\; p \, dp \, d\theta \qquad (5) $$

Since p_error = p_true − p_measured (as vectors), we have
p_error = sqrt(p_measured² + p_true² − 2 p_measured p_true cos θ), and the
above integral turns into

$$ F(p_{\rm true} < p'_{\rm true}) = \int_0^{p'_{\rm true}} \int_0^{2\pi} \frac{f\!\left(\sqrt{p_{\rm measured}^2 + p^2 - 2 p_{\rm measured}\, p \cos\theta}\right)}{2\pi \sqrt{p_{\rm measured}^2 + p^2 - 2 p_{\rm measured}\, p \cos\theta}}\; p \, dp \, d\theta \qquad (6) $$

Thus F(p_true > p'_true) = 1 − F(p_true < p'_true) is the probability that p_true falls outside the
shadowed area, i.e., the object in consideration has a true PM larger than the given threshold
p'_true. Panel (a) in Figure [11] shows the calculation of F(p_true > p'_true) for two measured PM at
two g magnitudes (18.28 and 19.70). Based on this, we define a series of confidence intervals
(p_true,1σ, p_true,2σ, and p_true,3σ) to be the p'_true corresponding to F(p_true > p'_true) = 68%, 95%,
and 99.7%. Panel (b) in Figure [11] shows these confidence intervals.

Here we use an example to illustrate the effect of the non-Gaussian tail in the PM error
distribution in determining the confident velocity of a star with some measured PM. For a
star with a measured PM of 30 mas yr⁻¹ (corresponding to a transverse velocity of 423 km s⁻¹
at a heliocentric distance of 3 kpc) and σ of 4 mas yr⁻¹, if the PM error distribution only
contains the Gaussian core (e.g. f(p|σ) = (p/σ²) exp(−p²/2σ²)), the 2σ confident PM would be
23.7 mas yr⁻¹ (334 km s⁻¹) and the 3σ confident PM would be 19.3 mas yr⁻¹ (272 km s⁻¹). On the other hand, when
the non-Gaussian tail is included, the 2σ and 3σ confident PM decrease to 22.6 and 13.8 mas
yr⁻¹ (319 and 195 km s⁻¹). When combined with a model of the Galactic potential and a
radial velocity, these differences in confident velocity may change the conclusion as to whether
this star is bound to the Galaxy or not. In general, the effect of the non-Gaussian tail is
more prominent at the faint end, where the Gaussian width is larger and the weight of the
non-Gaussian tail is bigger.
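
The Gaussian-core-only numbers in this example can be checked with a short numerical sketch of Equation 6. The code below is an illustration under stated assumptions, not the procedure actually used for the paper: it uses only the Gaussian-core pdf quoted above (no wing term), a simple midpoint rule, and function names of our own choosing. Any other normalized error pdf could be passed in place of `gaussian_core_pdf`.

```python
import math


def gaussian_core_pdf(p_err, sigma):
    """Gaussian-core-only error pdf: f(p|sigma) = (p/sigma^2) exp(-p^2 / (2 sigma^2))."""
    return (p_err / sigma**2) * math.exp(-p_err**2 / (2.0 * sigma**2))


def radial_density(p, p_meas, pdf, n_theta=180):
    """Angle-integrated integrand of Equation 6 at true-PM magnitude p."""
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for j in range(n_theta):
        theta = (j + 0.5) * dtheta  # midpoints, so theta is never exactly 0
        p_err = math.sqrt(p_meas**2 + p**2 - 2.0 * p_meas * p * math.cos(theta))
        total += pdf(p_err) / (2.0 * math.pi * p_err)
    return total * dtheta * p


def confident_pm(p_meas, pdf, confidence, p_max, n_p=2000):
    """Threshold p' such that F(p_true > p') equals the requested confidence.

    Accumulates F(p_true < p') outward from p = 0 and stops once it reaches
    1 - confidence, interpolating linearly inside the last radial cell.
    """
    dp = p_max / n_p
    target = 1.0 - confidence
    cumulative = 0.0
    for i in range(n_p):
        step = radial_density((i + 0.5) * dp, p_meas, pdf) * dp
        if cumulative + step >= target:
            return i * dp + dp * (target - cumulative) / step
        cumulative += step
    return p_max


sigma = 4.0     # mas/yr, Gaussian width used in the example
p_meas = 30.0   # mas/yr, measured PM used in the example
core = lambda p_err: gaussian_core_pdf(p_err, sigma)
pm_2sigma = confident_pm(p_meas, core, 0.95, p_max=p_meas + 10 * sigma)
pm_3sigma = confident_pm(p_meas, core, 0.997, p_max=p_meas + 10 * sigma)
```

Run as written, `pm_2sigma` and `pm_3sigma` should come out close to the 23.7 and 19.3 mas yr⁻¹ quoted for the Gaussian-core-only case. Reproducing the values that include the non-Gaussian tail would require substituting the full fitted pdf, which is not reproduced here.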

Lastly, we note that in this section all the calculations are done with the one-parameter
error probability function f(p_error) = f(p|σ(g)) (Equation [2] and Table [2]), but it is a
trivial exercise to replace f(p|σ) with f(p|σ, a, b, α, β, c) (Table [2]) if more accurate results are
desired. However, in doing so one needs to interpolate the parameters at the six discrete g
magnitudes to get the result at some arbitrary g.



5. Summary and discussion 

We have investigated the proper-motion measurement errors in the SDSS+USNO-B
proper-motion catalog (Munn et al. 2004) by analyzing the proper-motion distributions of
several recent SDSS quasar samples (Richards et al. 2009; Bond et al. 2010; Schneider et al.
2010; Bovy et al. 2011). The sample defined by Schneider et al. (2010) was determined to be
the cleanest sample and has the largest sample size. We bin the data into six g magnitude
(not extinction corrected) bins, and fit analytic functions for the PM distribution in each bin.
We find that, while a six-parameter fitting formula (Equation [1]) describes the quasar PM
distribution well, a simpler (normalized) function with one free parameter (Equation [2]) also
gives reasonably good results. Based on this fitting function for the PM error distribution,
we calculate the probability that an object with a measured PM has a true PM p_true larger
than a given threshold p'_true. Cutting the probability F(p_true > p'_true) at several confidence
levels, we calculate the "most likely PM" for a given measured PM.






The analysis raised several issues which we would like to stress: 



1. Our PM error analysis can only be applied to clean PM subsamples from the SDSS+USNO-B
catalog that satisfy our strict clean PM criteria (match = 1, sigRa < 525 &&
sigDec < 525, dist22 > 7, and most importantly, nFit = 6, which requires that the
object was detected on all five USNO-B plates). Experiments show that weakening
this set of conditions (specifically, to the original criteria in Munn et al. (2004): match =
1 and sigRa < 350 && sigDec < 350) generally results in a broader Gaussian core
width (up to 10%), a worse χ² statistic in the fitting, and a significantly higher fraction
of outliers (by an order of magnitude). Studies which are sensitive to the tail of
the PM error distribution and the fraction of outliers, such as searches for extremely
high velocity stars in the Galaxy, will be severely affected if this weaker set of
conditions is used instead of ours.

2. The PM error distribution derived here applies to g magnitudes (without extinction
correction) in the range ~17 to 20.5. Specifically, the fitting functions in Equations [2]
and [3] are only good for 17.5 < g < 20.5. Though using a constant σ for brighter
objects is probably appropriate, using a constant or extrapolating our results for fainter
magnitudes is not (Section [3]).

3. While we have derived a fitting function for the entire PM error range, we note that
there is a small fraction of "outliers" (defined here to be PM error > 30 mas yr⁻¹), which
apparently do not belong to the derived error distribution. Even for our cleanest quasar
sample, which has passed spectroscopic visual inspection (Schneider et al. 2010), and
with our very conservative PM culling conditions, there is still a small number of
outliers, up to ~0.1%, in the faintest magnitude bins. We do not speculate on the
origin of these, but they are likely present in any sample chosen by our criteria.

4. While we fit the PM error distribution as a function of magnitude, readers should
bear in mind that the PM error may depend on other parameters as well, such as color
and position of the objects on the sky. The PM error distribution in this work is
best applicable to objects in a color range similar to that of the quasar sample we use to
get the distribution (i.e., blue stars; Fig. [7]), though our result could be considered
as a conservative limit for redder objects. While the median and rms of the PM
error depend only weakly on the position on the sky, we do not have a large enough
sample to investigate the dependence of the distribution on the position on the sky.
The distributions of the PM error components in the longitudinal and latitudinal directions are
very similar to each other (Fig. [8]), and the correlation between the two is negligible
compared to the total random and systematic errors.

5. Last, we note that the PM error estimate provided by Munn et al. (2004; pmraerr and
pmdecerr) is in rough agreement with the σ fitted from the quasar PM distribution
(Fig. [5]). The agreement is within 20%, and better at the faint end. On the other hand,
the major drawback of the catalog-provided error estimate is that it does not address
the non-Gaussian tail. We recommend that future analyses using the Munn et al. proper
motions, especially those for which the tail of the error distribution is important,
use this work to estimate the error instead of the catalog-provided error
estimate.
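
The two sets of selection criteria quoted in point 1 can be written as simple predicates. This is a minimal sketch which assumes each catalog entry is available as a mapping keyed by the field names used in the text (match, sigRa, sigDec, dist22, nFit); the function names and the example row are hypothetical.

```python
def passes_strict_clean_pm(row):
    """Strict clean-PM criteria adopted in this work (point 1 above)."""
    return (
        row["match"] == 1
        and row["sigRa"] < 525
        and row["sigDec"] < 525
        and row["dist22"] > 7
        and row["nFit"] == 6   # detected on all five USNO-B plates
    )


def passes_original_clean_pm(row):
    """Weaker, original criteria as quoted in the text."""
    return row["match"] == 1 and row["sigRa"] < 350 and row["sigDec"] < 350


# Hypothetical row, for illustration only.
example_row = {"match": 1, "sigRa": 300, "sigDec": 300, "dist22": 10, "nFit": 5}
```

Here `example_row` passes the original cut but fails the strict one because nFit is not 6, which is the kind of object the stricter criteria are designed to exclude.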

This investigation of the PM error is useful in many ways. For example, we are
studying the kinematics of stars in SEGUE II, focusing on hyper-velocity stars with
velocities exceeding the escape velocity of the Galactic potential. In this case, it is crucial to
have an accurate PM error distribution to judge the significance of the hyper velocity of
these escapers.

Fig. 2. The PM distribution for S-Richards (upper left), S-Bovy (upper right), S-Bond
(lower left), and S-Schneider (lower right), in six bins of g magnitude (uncorrected for
extinction). The arrows indicate that the last bin contains all the PM > 50 mas yr⁻¹ objects.

Fig. 3. The fitting results for S-Schneider in six bins of g magnitude (uncorrected for
extinction). The red curves show the full six-parameter function (Equation [1]), while the
green curves show the two-parameter approximation (α = 1.0, β = 1.0, c = 0.9, Equation [2],
and normalized b = 0.035). The dash-dotted lines are the core of the distribution (first
term in Equations [1] and [2]), the dashed lines are the wing of the distribution (second term of
Equations [1] and [2]), and the solid lines are the combination of the two.

Fig. 4. (a): The χ² in each PM bin for the freedom-reduced fitting for the 18 < g < 18.5
bin in S-Schneider. (b): The accumulated χ² in this case (the number of degrees of freedom is 58).

Fig. 5. The value of σ in the approximate distribution (Equation [2]) of the quasar PM in
S-Schneider as a function of average g magnitude (solid points, the third set of rows in Table [2]),
and the corresponding fitting function (solid line, Equation [3]). The dashed line shows the
σ fitting result in Bond et al. (2010) (their Equation 1; we convert the r magnitude into g
magnitude by g − r = 0.18, the average value for S-Schneider). The open circles are the
average catalog-provided PM error estimate in the Munn et al. catalog (pmraerr or pmdecerr)
for each g bin.

Fig. 7. Histograms of colors (no extinction correction) for S-Schneider.

Fig. 8. The distribution of PM in each component (pml and pmb) in the Munn et al.
catalog for S-Schneider.

Fig. 9. The distribution of the PM error estimate (PM_error)
in the Munn et al. catalog for S-Schneider.

Fig. 10. Schematic plot which shows how we calculate the probability F(p_true < p'_true).

Fig. 11. (a): The calculated probability F(p_true > p'_true) (Section [4]) as a function of p'_true
for two measured PM at 10 and 30 mas yr⁻¹. (b): Three confidence levels for the true PM
(Section [4]) as a function of measured PM. In both plots thick lines are for g = 18.28, and
thin lines are for g = 19.70. Here 1, 2, and 3σ refer to probabilities of 68%, 95%, and 99.7%
that the true PM exceeds the plotted value.



